Breaking CAPTCHAs on the Dark Web
Abstract
On the Dark Web, several websites inhibit automated scraping by employing CAPTCHAs. A web scraper can still collect important content from such a website if it can solve these CAPTCHAs. For this purpose, two tools are used to solve simple CAPTCHAs: TensorFlow, a Machine Learning tool, and Tesseract, an Optical Character Recognition tool. Two sets of CAPTCHAs, of types that are also used on some Dark Web websites, were generated for testing purposes. Tesseract achieved success rates of 27.6% and 13.7% for sets 1 and 2, respectively. Three TensorFlow models were created: one per set of CAPTCHAs and one trained on the two sets mixed together. TensorFlow achieved success rates of 94.6%, 99.7%, and 70.1% for the first, second, and mixed sets, respectively. Training TensorFlow for a single type of CAPTCHA is an initial investment that can take up to two days, depending on implementation efficiency and hardware. Training also requires the CAPTCHA images together with their answers, whereas Tesseract can be used on demand without prior training.
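The success rates quoted above can be read as the share of CAPTCHAs a solver answers correctly. The following is a minimal sketch of such a measurement, not the authors' evaluation code. It assumes that a CAPTCHA counts as solved only when the predicted text matches the answer exactly (case-sensitive), and the names `success_rate`, `solver`, and `samples` are hypothetical. The solver is any function that maps an image path to a predicted string, so a Tesseract-backed or TensorFlow-backed function could be plugged in.

```python
from typing import Callable, Iterable, Tuple


def success_rate(
    solver: Callable[[str], str],
    samples: Iterable[Tuple[str, str]],
) -> float:
    """Return the percentage of CAPTCHAs the solver answers correctly.

    Assumption: a CAPTCHA is solved only on an exact, case-sensitive
    match between the prediction (whitespace-stripped) and the answer.
    """
    total = 0
    correct = 0
    for image_path, answer in samples:
        total += 1
        if solver(image_path).strip() == answer:
            correct += 1
    if total == 0:
        raise ValueError("no samples to evaluate")
    return 100.0 * correct / total


# Hypothetical stand-in for a real solver: looks up canned predictions
# instead of reading image files.
canned_predictions = {"a.png": "x1k9", "b.png": "zz00\n"}


def stub_solver(image_path: str) -> str:
    return canned_predictions[image_path]


labelled = [("a.png", "x1k9"), ("b.png", "y2m4")]
rate = success_rate(stub_solver, labelled)
```

Keeping the solver as a plain callable means the same labelled set can be scored against different tools, which is the kind of side-by-side comparison the abstract reports.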
Similar resources
Breaking Audio CAPTCHAs
CAPTCHAs are computer-generated tests that humans can pass but current computer systems cannot. CAPTCHAs provide a method for automatically distinguishing a human from a computer program, and therefore can protect Web services from abuse by so-called “bots.” Most CAPTCHAs consist of distorted images, usually text, for which a user must provide some description. Unfortunately, visual CAPTCHAs li...
Decaptcha: Breaking 75% of eBay Audio CAPTCHAs
CAPTCHA tests aim at preventing attackers from performing automatic website registration. In this paper we show that our prototype Decaptcha is able to successfully break 75% of eBay audio captchas. We compare its performance with the state of the art, readily available speech recognition system Sphinx and discuss the implications for eBay security.
The AI Hardness of CAPTCHAs does not imply Robust Network Security
A CAPTCHA is a special kind of AI hard test to prevent bots from logging into computer systems. We define an AI hard test to be a problem which is intractable for a computer to solve as a matter of general consensus of the AI community. On the Internet, CAPTCHAs are typically used to prevent bots from signing up for illegitimate email accounts or to prevent ticket scalping on e-commerce web sit...
SEIMCHA: a new semantic image CAPTCHA using geometric transformations
As protection of web applications are getting more and more important every day, CAPTCHAs are facing booming attention both by users and designers. Nowadays, it is well accepted that using visual concepts enhance security and usability of CAPTCHAs. There exist few major different ideas for designing image CAPTCHAs. Some methods apply a set of modifications such as rotations to the original imag...
Accessible Voice CAPTCHAs for Internet Telephony
CAPTCHAs have become a pervasive method for protecting against automated submissions to web forums and registration to web based email services. The CAPTCHAs are usually image-based, but voice CAPTCHAs have also emerged as an alternative. In this short note, we discuss our ongoing efforts on designing accessible voice CAPTCHAs for Internet Telephony. We have implemented a testbed for Skype to a...
Publication date: 2018